Introduction to Deep Learning

What is Deep Learning?

Deep learning can be understood as a set of algorithms developed to train artificial neural networks with many layers efficiently.

Artificial neurons are the building blocks of multilayer artificial neural networks. The basic concept behind artificial neural networks was built upon hypotheses and models of how the human brain works to solve complex problems. Although artificial neural networks have gained a lot of popularity in recent years, early studies of neural networks go back to the 1940s, when Warren McCulloch and Walter Pitts first described how neurons could work.

However, in the decades that followed the first implementation of the McCulloch-Pitts neuron model, Rosenblatt's perceptron in the 1950s, many researchers and machine learning practitioners slowly began to lose interest in neural networks, since no one had a good solution for training a neural network with multiple layers. Interest in neural networks was eventually rekindled in 1986, when D.E. Rumelhart, G.E. Hinton, and R.J. Williams (re)discovered and popularized the backpropagation algorithm for training neural networks more efficiently (Learning representations by back-propagating errors, David E. Rumelhart, Geoffrey E. Hinton, Ronald J. Williams, Nature, 323 (6088): 533-536, 1986).

Artificial neuron

An artificial neuron is the basic unit of a neural network. It computes a weighted sum of its inputs and then applies an activation function to that sum. The activation function can be linear or nonlinear. Each input to the neuron has an associated weight; these weights are the parameters the network learns during the training phase.

A schematic diagram of a neuron is given below.
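
To make this concrete, here is a minimal NumPy sketch of a single artificial neuron: it computes the weighted sum of its inputs plus a bias and passes the result through a sigmoid activation. The input values, weights, and bias are arbitrary illustration values, not taken from any trained network.

import numpy as np

def sigmoid(z):
    # squash the weighted sum into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def neuron_output(inputs, weights, bias):
    # weighted sum of the inputs plus a bias term
    z = np.dot(weights, inputs) + bias
    # apply the activation function to normalize the sum
    return sigmoid(z)

# arbitrary example inputs, weights, and bias
x = np.array([0.5, -1.2, 3.0])
w = np.array([0.4, 0.6, -0.1])
b = 0.05
print(neuron_output(x, w, b))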

Activation function

The activation function acts as the decision-making element at the output of a neuron. Depending on the activation function, the neuron learns linear or nonlinear decision boundaries. It also has a normalizing effect on the neuron output, which prevents the outputs of neurons from growing very large after several layers due to the cascading effect. The three most widely used activation functions are listed below, followed by a short sketch of each.

  1. Sigmoid: maps the input (x axis) to values between 0 and 1.

  2. Tanh: similar to the sigmoid function, but maps the input to values between -1 and 1.

  3. Rectified Linear Unit (ReLU): allows only positive values to pass through; negative values are mapped to zero.
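
As a quick illustration (and not part of the Keras model built later), the three functions can be written in a few lines of NumPy to show their output ranges.

import numpy as np

def sigmoid(x):
    # maps any real input to the range (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    # maps any real input to the range (-1, 1)
    return np.tanh(x)

def relu(x):
    # passes positive values through unchanged; negatives become 0
    return np.maximum(0.0, x)

x = np.linspace(-5, 5, 5)
print(sigmoid(x))  # values between 0 and 1
print(tanh(x))     # values between -1 and 1
print(relu(x))     # zeros for the negative inputs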

Single-layer perceptron

The simplest kind of neural network is a single-layer perceptron network, which consists of a single layer of output nodes; the inputs are fed directly to the outputs via a series of weights. It can therefore be considered the simplest kind of feed-forward network.

The sum of the products of the weights and the inputs is calculated in each node, and if the value is above some threshold (typically 0) the neuron fires and takes the activated value (typically 1); otherwise it takes the deactivated value (typically -1). Neurons with this kind of activation function are also called artificial neurons or linear threshold units. In the literature the term perceptron often refers to networks consisting of just one of these units. A similar neuron was described by Warren McCulloch and Walter Pitts in the 1940s.
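
The threshold behaviour described above can be sketched directly in NumPy; the weights and inputs below are arbitrary values chosen only for illustration.

import numpy as np

def linear_threshold_unit(inputs, weights, threshold=0.0):
    # sum of the products of the weights and the inputs
    z = np.dot(weights, inputs)
    # fire (+1) if the sum exceeds the threshold, otherwise output -1
    return 1 if z > threshold else -1

# arbitrary example: two inputs feeding a single output node
x = np.array([0.7, -0.3])
w = np.array([0.5, 0.9])
print(linear_threshold_unit(x, w))  # 0.5*0.7 + 0.9*(-0.3) = 0.08 > 0, so prints 1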

Multilayer perceptron

A multilayer perceptron (MLP) is a class of feedforward artificial neural network. An MLP consists of at least three layers of nodes. Except for the input nodes, each node is a neuron that uses a nonlinear activation function. Its multiple layers and nonlinear activations distinguish an MLP from a linear perceptron: it can distinguish data that is not linearly separable.
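
As a small illustration of that last point, the sketch below trains a tiny MLP with one hidden layer on the XOR function, a classic example of data that is not linearly separable. It uses the same Keras API as the network built later in this section; the layer sizes, optimizer, and number of epochs are arbitrary choices for the sketch.

import numpy as np
from keras.models import Sequential
from keras.layers import Dense

# XOR: not linearly separable, so a single-layer perceptron cannot learn it
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype="float32")
y = np.array([[0], [1], [1], [0]], dtype="float32")

# one hidden layer with a nonlinear activation is enough to separate XOR
model = Sequential()
model.add(Dense(8, input_dim=2, activation="tanh"))
model.add(Dense(1, activation="sigmoid"))
model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])
model.fit(X, y, epochs=2000, verbose=0)

print(model.predict(X).round())  # should approximate [[0], [1], [1], [0]]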

Feedforward neural network

Feedforward neural networks are the most common networks used in Deep Learning.

In this type of architecture, a connection between two nodes is only permitted from nodes in layer i to nodes in layer i + 1 (hence the term feedforward; there are no backwards or inter-layer connections allowed).

Furthermore, the nodes in layer i are fully connected to the nodes in layer i + 1: every node in layer i connects to every node in layer i + 1. For example, in the figure above there are a total of 2 x 3 = 6 connections between layer 0 and layer 1. This is where the term "fully connected", or "FC" for short, comes from.
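
A quick way to verify the 2 x 3 = 6 count is to inspect the weight matrix of a Dense (fully connected) Keras layer. The layer sizes below simply mirror the figure and are otherwise arbitrary.

from keras.models import Sequential
from keras.layers import Dense

# a fully connected layer from 2 input nodes to 3 output nodes
model = Sequential()
model.add(Dense(3, input_dim=2))

weights, biases = model.layers[0].get_weights()
print(weights.shape)  # (2, 3) -> 2 x 3 = 6 connection weights
print(biases.shape)   # (3,)   -> one bias per node in the next layer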

Building a feedforward network with Keras

We are going to use the Kaggle Dogs vs. Cats classification challenge and build a feedforward network to classify the images as dogs or cats.

The goal of this challenge is to correctly classify whether a given image contains a dog or a cat.


In [1]:
# import the necessary packages
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from keras.models import Sequential
from keras.layers import Activation
from keras.optimizers import SGD
from keras.layers import Dense
from keras.utils import np_utils
from imutils import paths
import numpy as np
import argparse
import cv2
import os


Using TensorFlow backend.

In [2]:
def image_to_feature_vector(image, size=(32, 32)):
    # resize the image to a fixed size, then flatten the image into
    # a list of raw pixel intensities
    return cv2.resize(image, size).flatten()

In [3]:
# initialize the data matrix and labels list
data = []
labels = []
train_path='/Volumes/Data/Computer_Vision/kaggle_dogs_cats/train'

imagePaths = list(paths.list_images(train_path))
# loop over the input images
for (i, imagePath) in enumerate(imagePaths):
    # load the image and extract the class label (assuming that our
    # path has the format: /path/to/dataset/{class}.{image_num}.jpg)
    image = cv2.imread(imagePath)
    label = imagePath.split(os.path.sep)[-1].split(".")[0]
    
    # construct a feature vector of raw pixel intensities, then update
    # the data matrix and labels list
    features = image_to_feature_vector(image)
    data.append(features)
    labels.append(label)
    
    # show an update every 1,000 images
    if i > 0 and i % 1000 == 0:
        print("[INFO] processed {}/{}".format(i, len(imagePaths)))


[INFO] processed 1000/25000
[INFO] processed 2000/25000
[INFO] processed 3000/25000
[INFO] processed 4000/25000
[INFO] processed 5000/25000
[INFO] processed 6000/25000
[INFO] processed 7000/25000
[INFO] processed 8000/25000
[INFO] processed 9000/25000
[INFO] processed 10000/25000
[INFO] processed 11000/25000
[INFO] processed 12000/25000
[INFO] processed 13000/25000
[INFO] processed 14000/25000
[INFO] processed 15000/25000
[INFO] processed 16000/25000
[INFO] processed 17000/25000
[INFO] processed 18000/25000
[INFO] processed 19000/25000
[INFO] processed 20000/25000
[INFO] processed 21000/25000
[INFO] processed 22000/25000
[INFO] processed 23000/25000
[INFO] processed 24000/25000

In [4]:
# encode the labels, converting them from strings to integers
le = LabelEncoder()
labels = le.fit_transform(labels)
 
# scale the input image pixels to the range [0, 1], then transform
# the labels into one-hot encoded vectors -- this generates a vector
# for each label where the index of the label is set to `1` and all
# other entries to `0`
data = np.array(data) / 255.0
labels = np_utils.to_categorical(labels, 2)
 
# partition the data into training and testing splits, using 75%
# of the data for training and the remaining 25% for testing
print("[INFO] constructing training/testing split...")
(trainData, testData, trainLabels, testLabels) = train_test_split(
    data, labels, test_size=0.25, random_state=42)


[INFO] constructing training/testing split...

In [5]:
# define the architecture of the network
model = Sequential()
model.add(Dense(768, input_dim=3072, kernel_initializer="uniform", activation="relu"))
model.add(Dense(384, kernel_initializer="uniform", activation="relu"))
model.add(Dense(2))
model.add(Activation("softmax"))

In [6]:
# train the model using SGD
print("[INFO] compiling model...")
sgd = SGD(lr=0.01)
model.compile(loss="binary_crossentropy", optimizer=sgd, metrics=["accuracy"])
model.fit(trainData, trainLabels, epochs=50, batch_size=128, verbose=1)


[INFO] compiling model...
Epoch 1/50
18750/18750 [==============================] - 11s - loss: 0.6853 - acc: 0.5629    
Epoch 2/50
18750/18750 [==============================] - 9s - loss: 0.6652 - acc: 0.5975     
Epoch 3/50
18750/18750 [==============================] - 11s - loss: 0.6539 - acc: 0.6129    
Epoch 4/50
18750/18750 [==============================] - 9s - loss: 0.6443 - acc: 0.6275     
Epoch 5/50
18750/18750 [==============================] - 15s - loss: 0.6399 - acc: 0.6361    
Epoch 6/50
18750/18750 [==============================] - 18s - loss: 0.6369 - acc: 0.6335    
Epoch 7/50
18750/18750 [==============================] - 14s - loss: 0.6293 - acc: 0.6451    
Epoch 8/50
18750/18750 [==============================] - 10s - loss: 0.6270 - acc: 0.6478    
Epoch 9/50
18750/18750 [==============================] - 10s - loss: 0.6222 - acc: 0.6538    
Epoch 10/50
18750/18750 [==============================] - 13s - loss: 0.6184 - acc: 0.6559    
Epoch 11/50
18750/18750 [==============================] - 21s - loss: 0.6143 - acc: 0.6625    
Epoch 12/50
18750/18750 [==============================] - 17s - loss: 0.6096 - acc: 0.6667    
Epoch 13/50
18750/18750 [==============================] - 15s - loss: 0.6087 - acc: 0.6655    
Epoch 14/50
18750/18750 [==============================] - 15s - loss: 0.6064 - acc: 0.6691    
Epoch 15/50
18750/18750 [==============================] - 16s - loss: 0.6008 - acc: 0.6756    
Epoch 16/50
18750/18750 [==============================] - 10s - loss: 0.5974 - acc: 0.6764    
Epoch 17/50
18750/18750 [==============================] - 10s - loss: 0.5932 - acc: 0.6828    
Epoch 18/50
18750/18750 [==============================] - 11s - loss: 0.5931 - acc: 0.6836    
Epoch 19/50
18750/18750 [==============================] - 11s - loss: 0.5904 - acc: 0.6851    
Epoch 20/50
18750/18750 [==============================] - 13s - loss: 0.5844 - acc: 0.6873    
Epoch 21/50
18750/18750 [==============================] - 11s - loss: 0.5820 - acc: 0.6921    
Epoch 22/50
18750/18750 [==============================] - 10s - loss: 0.5803 - acc: 0.6929    
Epoch 23/50
18750/18750 [==============================] - 11s - loss: 0.5754 - acc: 0.7022    
Epoch 24/50
18750/18750 [==============================] - 10s - loss: 0.5716 - acc: 0.6991    
Epoch 25/50
18750/18750 [==============================] - 10s - loss: 0.5699 - acc: 0.7061    
Epoch 26/50
18750/18750 [==============================] - 11s - loss: 0.5667 - acc: 0.7051    
Epoch 27/50
18750/18750 [==============================] - 12s - loss: 0.5624 - acc: 0.7117    
Epoch 28/50
18750/18750 [==============================] - 11s - loss: 0.5603 - acc: 0.7130    
Epoch 29/50
18750/18750 [==============================] - 11s - loss: 0.5554 - acc: 0.7167    
Epoch 30/50
18750/18750 [==============================] - 11s - loss: 0.5519 - acc: 0.7209    
Epoch 31/50
18750/18750 [==============================] - 10s - loss: 0.5521 - acc: 0.7211    
Epoch 32/50
18750/18750 [==============================] - 11s - loss: 0.5499 - acc: 0.7216    
Epoch 33/50
18750/18750 [==============================] - 10s - loss: 0.5419 - acc: 0.7327    
Epoch 34/50
18750/18750 [==============================] - 10s - loss: 0.5391 - acc: 0.7299    
Epoch 35/50
18750/18750 [==============================] - 10s - loss: 0.5368 - acc: 0.7339    
Epoch 36/50
18750/18750 [==============================] - 10s - loss: 0.5367 - acc: 0.7277    
Epoch 37/50
18750/18750 [==============================] - 10s - loss: 0.5290 - acc: 0.7373    
Epoch 38/50
18750/18750 [==============================] - 9s - loss: 0.5239 - acc: 0.7424     
Epoch 39/50
18750/18750 [==============================] - 11s - loss: 0.5248 - acc: 0.7394    
Epoch 40/50
18750/18750 [==============================] - 10s - loss: 0.5205 - acc: 0.7416    
Epoch 41/50
18750/18750 [==============================] - 11s - loss: 0.5206 - acc: 0.7459    
Epoch 42/50
18750/18750 [==============================] - 11s - loss: 0.5104 - acc: 0.7561    
Epoch 43/50
18750/18750 [==============================] - 11s - loss: 0.5144 - acc: 0.7502    
Epoch 44/50
18750/18750 [==============================] - 11s - loss: 0.5077 - acc: 0.7524    
Epoch 45/50
18750/18750 [==============================] - 11s - loss: 0.5033 - acc: 0.7604    
Epoch 46/50
18750/18750 [==============================] - 11s - loss: 0.5050 - acc: 0.7580    
Epoch 47/50
18750/18750 [==============================] - 9s - loss: 0.4947 - acc: 0.7645     
Epoch 48/50
18750/18750 [==============================] - 9s - loss: 0.4958 - acc: 0.7603     
Epoch 49/50
18750/18750 [==============================] - 11s - loss: 0.4852 - acc: 0.7683    
Epoch 50/50
18750/18750 [==============================] - 10s - loss: 0.4899 - acc: 0.7675    
Out[6]:
<keras.callbacks.History at 0x11cca1ac8>

In [7]:
# show the accuracy on the testing set
print("[INFO] evaluating on testing set...")
(loss, accuracy) = model.evaluate(testData, testLabels, batch_size=128, verbose=1)
print("[INFO] loss={:.4f}, accuracy: {:.4f}%".format(loss, accuracy * 100))


[INFO] evaluating on testing set...
6250/6250 [==============================] - 1s     
[INFO] loss=0.6906, accuracy: 62.4480%

Accuracy

On a Titan X GPU, the entire process of feature extraction, training the neural network, and evaluation takes a total of about 1m 15s; in the run logged above, each epoch took roughly 10-20 seconds to complete.

At the end of the 50th epoch, we see that we are getting ~77% accuracy on the training data and ~62% accuracy on the testing data.

